A fine-tuned version of Qwen3-1.7B that improves mathematical reasoning through one-shot reinforcement learning with verifiable rewards (RLVR), and performs strongly on mathematical benchmarks and coding tasks.
Large Language Model
Safetensors